AITopics | image text

image text

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding

Neural Information Processing Systems, Feb-17-2026, 21:43:11 GMT

We present a novel OCR-free document understanding framework based on pre-trained Multimodal Large Language Models (MLLMs).

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Education (1.00)
Information Technology (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.66)

Contrastive Language-Image Pre-Training with Knowledge Graphs: Supplementary Material

Neural Information Processing Systems, Feb-10-2026, 18:59:48 GMT

In this way, the modality of the concept can differ across triplets or training batches, and the triplet forms can include (image/text, relation, image/text). The nodes are presented in a bounding box and the edges are represented by word tokens, e.g., "standing on". For each input modality in the training data, we adopt a unified processing procedure to make batch training possible. Specifically, the length of the image is set as 16x16 and the length of the text is set as 77. The VE task is similar to VQA, which also takes an image-text pair as input.
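The unified processing step can be illustrated with a minimal sketch. It assumes that text nodes and relation phrases (such as "standing on") arrive as lists of token ids, that id 0 is a hypothetical padding token, and that image nodes are already preprocessed to a fixed 16x16 grid and pass through unchanged. `ImageNode`, `TextNode`, `pad_or_truncate`, and `normalize_triplet` are illustrative names, not the authors' code; only the text length of 77 and the 16x16 image setting come from the excerpt above.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

TEXT_LEN = 77          # text length stated in the excerpt
IMAGE_GRID = (16, 16)  # image setting stated in the excerpt; grid interpretation assumed
PAD_ID = 0             # hypothetical padding token id


@dataclass
class ImageNode:
    grid: Tuple[int, int]  # assumed to be preprocessed to IMAGE_GRID already


@dataclass
class TextNode:
    token_ids: List[int]


Node = Union[ImageNode, TextNode]


def pad_or_truncate(token_ids: List[int], length: int = TEXT_LEN) -> List[int]:
    """Clip or right-pad a token sequence to exactly `length` ids."""
    clipped = token_ids[:length]
    return clipped + [PAD_ID] * (length - len(clipped))


def normalize_node(node: Node) -> Node:
    """Give a head or tail node a fixed shape, whichever modality it has."""
    if isinstance(node, TextNode):
        return TextNode(pad_or_truncate(node.token_ids))
    if node.grid != IMAGE_GRID:
        raise ValueError(f"expected image grid {IMAGE_GRID}, got {node.grid}")
    return node


def normalize_triplet(head: Node, relation: List[int], tail: Node) -> Tuple[Node, List[int], Node]:
    """Relations are word tokens, so they are padded like any other text."""
    return normalize_node(head), pad_or_truncate(relation), normalize_node(tail)


# An (image, relation, text) triplet; the token ids are made up.
head, rel, tail = normalize_triplet(ImageNode(IMAGE_GRID), [2, 3], TextNode([7, 8, 9]))
print(len(rel), len(tail.token_ids))  # 77 77
```

Because every text node and relation ends up with the same length, triplets whose head and tail modalities differ can still be stacked into one batch.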

artificial intelligence, better view in color, relation, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.90)

An Inverse Scaling Law for CLIP Training

Neural Information Processing Systems, Dec-26-2025, 10:08:52 GMT

CLIP, one of the pioneering foundation models that connect images and text, has enabled many recent breakthroughs in computer vision. However, its associated training cost is prohibitively high, imposing a significant barrier to its widespread exploration. In this paper, we present a surprising finding: there exists an inverse scaling law for CLIP training, whereby the larger the image/text encoders used, the shorter the sequence length of image/text tokens that can be applied in training. Moreover, we show that the strategy for reducing image/text token length plays a crucial role in determining the quality of this scaling law. As a result of this finding, we are able to successfully train CLIP even with limited computational resources. For example, using 8 A100 GPUs, our CLIP models achieve zero-shot top-1 ImageNet-1k accuracies of 63.2% in ~2 days, 67.8% in ~3 days, and 69.3% in ~4 days. Our method also works well when scaling up: with G/14, we register a new record of 83.0% ImageNet-1k zero-shot accuracy, while accelerating training by ~33x compared to its OpenCLIP counterpart. By reducing the computation barrier associated with CLIP, we hope to inspire more research in this field, particularly from academics.
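The token-count arithmetic behind shorter sequences can be made concrete for a ViT-style encoder that splits a square image into non-overlapping square patches. The sketch below shows three generic ways to shorten sequences: lowering image resolution, randomly masking image tokens, and truncating text. The 224- and 112-pixel resolutions and the 16-pixel patch size are illustrative assumptions, and these helpers are not the paper's specific strategies, which the abstract does not detail.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def num_patch_tokens(image_size: int, patch_size: int) -> int:
    """Tokens produced by non-overlapping square patches of a square image."""
    return (image_size // patch_size) ** 2


def random_mask(tokens: Sequence[T], keep: int, seed: int = 0) -> List[T]:
    """Keep `keep` randomly chosen tokens, preserving their original order."""
    rng = random.Random(seed)
    kept = sorted(rng.sample(range(len(tokens)), keep))
    return [tokens[i] for i in kept]


def truncate_text(token_ids: Sequence[int], max_len: int) -> List[int]:
    """Keep only the first `max_len` text tokens."""
    return list(token_ids[:max_len])


full = num_patch_tokens(224, 16)     # 14 x 14 = 196 tokens
resized = num_patch_tokens(112, 16)  # 7 x 7 = 49 tokens, a 4x reduction
masked = random_mask(list(range(full)), keep=resized)
print(full, resized, len(masked))      # 196 49 49
print(truncate_text([5, 6, 7, 8], 2))  # [5, 6]
```

Since the cost of standard self-attention grows quadratically with sequence length, a 4x shorter image sequence reduces the attention-score computation of each layer by roughly 16x (per-token MLP cost drops about 4x). The finding reported above is that larger encoders can be trained with such shorter sequences.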

clip training, inverse scaling law, name change, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)
Information Technology > Artificial Intelligence > Machine Learning (0.43)

Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding

Neural Information Processing Systems, Oct-10-2025, 15:23:54 GMT

We present a novel OCR-free document understanding framework based on pre-trained Multimodal Large Language Models (MLLMs).

corresponding text, dataset, image text, (12 more...)

Neural Information Processing Systems

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Education (1.00)
Information Technology (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.66)

An Inverse Scaling Law for CLIP Training

Neural Information Processing Systems, Jan-19-2025, 16:23:52 GMT

clip training, image text, inverse scaling law, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.32)

BabyAI++: Towards Grounded-Language Learning beyond Memorization

Cao, Tianshi, Wang, Jingkang, Zhang, Yining, Manivasagam, Sivabalan

arXiv.org Artificial Intelligence, Apr-15-2020

Despite success in many real-world tasks (e.g., robotics), reinforcement learning (RL) agents still learn tabula rasa when facing new and dynamic scenarios. By contrast, humans can offload this burden through textual descriptions. Although recent works have shown the benefits of instructive texts in goal-conditioned RL, few have studied whether descriptive texts help agents generalize across dynamic environments. To promote research in this direction, we introduce a new platform, BabyAI++, that generates various dynamic environments along with corresponding descriptive texts. Moreover, we benchmark several baselines inherited from the instruction-following setting and develop a novel approach to visually grounded language learning on our platform. Extensive experiments provide strong evidence that using descriptive texts improves the generalization of RL agents across environments with varied dynamics.
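The idea of pairing environment dynamics with descriptive text can be sketched with a toy generator. Everything here is hypothetical: the tile colors, the "slippery"/"sticky"/"dangerous" properties, and the functions `sample_dynamics` and `describe` are illustrative and are not the BabyAI++ API. Each episode draws a fresh color-to-property mapping, so an agent cannot rely on memorizing which color behaves how and has to use the description instead.

```python
import random
from typing import Dict

# Hypothetical vocabulary; not taken from BabyAI++.
COLORS = ["red", "green", "blue", "purple", "yellow", "grey"]
PROPERTIES = ["slippery", "sticky", "dangerous"]


def sample_dynamics(num_tile_types: int, seed: int) -> Dict[str, str]:
    """Assign a dynamic property to each of `num_tile_types` distinct floor colors."""
    rng = random.Random(seed)
    colors = rng.sample(COLORS, num_tile_types)
    return {color: rng.choice(PROPERTIES) for color in colors}


def describe(dynamics: Dict[str, str]) -> str:
    """Render the mapping as descriptive text, sorted by color for determinism."""
    return " ".join(
        f"{color.capitalize()} tiles are {prop}." for color, prop in sorted(dynamics.items())
    )


# A new episode gets new dynamics and a matching description.
episode = sample_dynamics(num_tile_types=3, seed=1)
print(describe(episode))
print(describe({"blue": "slippery", "red": "sticky"}))  # Blue tiles are slippery. Red tiles are sticky.
```

Holding out some color-property combinations during training and testing on them is one way a platform like this can probe generalization beyond memorization.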

agent, descriptive text, image text, (17 more...)

arXiv.org Artificial Intelligence

2004.072

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (1.00)

Industry: Education > Curriculum > Subject-Specific Education (0.60)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.40)

AI Replaces Human Appraisers stardate 2019.420

#artificialintelligence, Jun-7-2019, 15:06:44 GMT

What data actually matters for appraising a property? There is a long list of things to consider; it is a complicated process for humans, and it has changed little in decades. One thing human appraisers have struggled to account for is the unstructured elements of a property. Many of these factors have been considered too "subjective" to influence your price estimate. Sure, gross quality issues (damaged flooring, etc.) can factor in, but what about your choice of tile for the backsplash?

artificial intelligence, main image, replace human appraiser stardate 2019, (12 more...)

#artificialintelligence

Country: North America > United States > Utah (0.16)

Industry: Banking & Finance > Real Estate (0.98)

Technology: Information Technology > Artificial Intelligence (0.49)

Looking Beyond Text: Extracting Figures, Tables and Captions from Computer Science Papers

Clark, Christopher Andreas (The Allen Institute for Artificial Intelligence) | Divvala, Santosh (The Allen Institute for Artificial Intelligence)

AAAI Conferences, Mar-1-2015

Identifying and extracting figures and tables along with their captions from scholarly articles is important both as a way of providing tools for article summarization and as part of larger systems that seek to gain a deeper, semantic understanding of these articles. While many "off-the-shelf" tools, e.g., PDFBox and Poppler, can extract embedded images from these documents, they are unable to extract tables, captions, and figures composed of vector graphics. Our proposed approach analyzes the structure of individual pages of a document by detecting chunks of body text, and locates the areas where figures or tables could reside by reasoning about the empty regions within that text. This method can extract a wide variety of figures because it does not make strong assumptions about the format of the figures embedded in the document, as long as they can be differentiated from the main article's text. Our algorithm also includes a caption-to-figure matching component that is effective even in cases where individual captions are adjacent to multiple figures. Our contribution also includes methods for leveraging particular consistency and formatting assumptions to identify titles, body text, and captions within each article. We introduce a new dataset of 150 computer science papers along with ground-truth labels for the locations of the figures, tables, and captions within them. Our algorithm achieves 96% precision at 92% recall when tested against this dataset, surpassing the previous state of the art. We release our dataset, code, and evaluation scripts on our project website to enable future research.
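The intuition of locating figures by reasoning about empty regions within body text can be illustrated in one dimension. The sketch below assumes a single-column page whose body-text lines have already been detected as vertical (top, bottom) intervals, with y growing downward, and reports every gap at least `min_gap` tall as a candidate figure or table region. The name `empty_regions` and the threshold are assumptions for illustration; this is a simplification of the idea in the abstract, not the authors' algorithm.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (top, bottom) in page coordinates; y grows downward


def empty_regions(text_lines: List[Interval], page_top: float, page_bottom: float,
                  min_gap: float) -> List[Interval]:
    """Return vertical gaps between body-text lines that are at least `min_gap` tall."""
    regions: List[Interval] = []
    cursor = page_top  # lowest point covered by text so far
    for top, bottom in sorted(text_lines):
        if top - cursor >= min_gap:
            regions.append((cursor, top))
        cursor = max(cursor, bottom)
    if page_bottom - cursor >= min_gap:
        regions.append((cursor, page_bottom))
    return regions


# Two text lines, a large blank stretch (e.g. a figure), then two more lines.
lines = [(50, 60), (62, 72), (300, 310), (312, 322)]
print(empty_regions(lines, page_top=50, page_bottom=322, min_gap=30))  # [(72, 300)]
```

A fuller system would also have to handle multi-column layouts, distinguish figures from tables, and pair each region with a caption, which the abstract describes as separate components.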

caption, machine learning, natural language, (20 more...)

AAAI Conferences

Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)
